{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "05 - Extractive QA with txtai",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vwELCooy4ljr"
      },
      "source": [
        "# Extractive QA with txtai\n",
        "\n",
        "In Parts 1 through 4, we gave a general overview of txtai, the backing technology and examples of how to use it for similarity searches. This notebook builds on that and extends to building extractive question-answering systems."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ew7orE2O441o"
      },
      "source": [
        "# Install dependencies\n",
        "\n",
        "Install `txtai` and all dependencies."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "LPQTb25tASIG"
      },
      "source": [
        "%%capture\n",
        "!pip install git+https://github.com/neuml/txtai"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_YnqorRKAbLu"
      },
      "source": [
        "# Create an Embeddings and Extractor instances\n",
        "\n",
        "The Embeddings instance is the main entrypoint for txtai. An Embeddings instance defines the method used to tokenize and convert a segment of text into an embeddings vector.\n",
        "\n",
        "The Extractor instance is the entrypoint for extractive question-answering.\n",
        "\n",
        "Both the Embeddings and Extractor instances take a path to a transformer model. Any model on the [Hugging Face model hub](https://huggingface.co/models) can be used in place of the models below."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "OUc9gqTyAYnm"
      },
      "source": [
        "%%capture\n",
        "\n",
        "from txtai.embeddings import Embeddings\n",
        "from txtai.extractor import Extractor\n",
        "\n",
        "# Create embeddings model, backed by sentence-transformers & transformers\n",
        "embeddings = Embeddings({\"path\": \"sentence-transformers/nli-mpnet-base-v2\"})\n",
        "\n",
        "# Create extractor instance\n",
        "extractor = Extractor(embeddings, \"distilbert-base-cased-distilled-squad\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "4X5z3UjnAGe7",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "131cfed3-e6f0-4cee-dc66-f242cd1a5bf7"
      },
      "source": [
        "data = [\"Giants hit 3 HRs to down Dodgers\",\n",
        "        \"Giants 5 Dodgers 4 final\",\n",
        "        \"Dodgers drop Game 2 against the Giants, 5-4\",\n",
        "        \"Blue Jays beat Red Sox final score 2-1\",\n",
        "        \"Red Sox lost to the Blue Jays, 2-1\",\n",
        "        \"Blue Jays at Red Sox is over. Score: 2-1\",\n",
        "        \"Phillies win over the Braves, 5-0\",\n",
        "        \"Phillies 5 Braves 0 final\",\n",
        "        \"Final: Braves lose to the Phillies in the series opener, 5-0\",\n",
        "        \"Lightning goaltender pulled, lose to Flyers 4-1\",\n",
        "        \"Flyers 4 Lightning 1 final\",\n",
        "        \"Flyers win 4-1\"]\n",
        "\n",
        "questions = [\"What team won the game?\", \"What was score?\"]\n",
        "\n",
        "execute = lambda query: extractor([(question, query, question, False) for question in questions], data)\n",
        "\n",
        "for query in [\"Red Sox - Blue Jays\", \"Phillies - Braves\", \"Dodgers - Giants\", \"Flyers - Lightning\"]:\n",
        "    print(\"----\", query, \"----\")\n",
        "    for answer in execute(query):\n",
        "        print(answer)\n",
        "    print()\n",
        "\n",
        "# Ad-hoc questions\n",
        "question = \"What hockey team won?\"\n",
        "\n",
        "print(\"----\", question, \"----\")\n",
        "print(extractor([(question, question, question, False)], data))"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "---- Red Sox - Blue Jays ----\n",
            "('What team won the game?', 'Blue Jays')\n",
            "('What was score?', '2-1')\n",
            "\n",
            "---- Phillies - Braves ----\n",
            "('What team won the game?', 'Phillies')\n",
            "('What was score?', '5-0')\n",
            "\n",
            "---- Dodgers - Giants ----\n",
            "('What team won the game?', 'Giants')\n",
            "('What was score?', '5-4')\n",
            "\n",
            "---- Flyers - Lightning ----\n",
            "('What team won the game?', 'Flyers')\n",
            "('What was score?', '4-1')\n",
            "\n",
            "---- What hockey team won? ----\n",
            "[('What hockey team won?', 'Flyers')]\n"
          ],
          "name": "stdout"
        }
      ]
    }
  ]
}